fix: set TTL on Celery pidbox reply keys to prevent Redis memory leak#559

Draft
bartmichalak wants to merge 1 commit into main from fix/celery-pidbox-ttl-redis-memory-leak

Conversation


@bartmichalak bartmichalak commented Mar 5, 2026

Description

The change adds two configuration parameters to Celery (control_queue_ttl=300, control_queue_expires=300) in core.py:54-55.

What these parameters do:

  • control_queue_ttl - sets a TTL (300s) on messages in control queues (pidbox reply keys), after which Redis automatically removes them
  • control_queue_expires - sets an expiration time (300s) on the control queues themselves, after which Redis deletes them
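As a minimal sketch of the settings described above (the module path and the idea of grouping them in a dict are illustrative; only the two keys and their values come from the PR), the change amounts to:

```python
# Sketch of the two new settings added to the Celery config in core.py.
# Grouping them in a dict here is purely illustrative.
CELERY_CONTROL_SETTINGS = {
    # Messages sitting in pidbox control queues expire after 300 s,
    # so Redis drops stale replies automatically.
    "control_queue_ttl": 300,
    # The control queues themselves are deleted after 300 s of disuse,
    # which removes the orphaned reply keys entirely.
    "control_queue_expires": 300,
}

# In core.py this would be applied to the app, e.g.:
# app.conf.update(**CELERY_CONTROL_SETTINGS)
```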

Problem:

Celery creates temporary pidbox reply keys in Redis every time a worker responds to control commands (e.g. celery inspect). Without a TTL, these keys persist in Redis indefinitely, causing a memory leak.
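To make the leak concrete, here is a diagnostic sketch (not part of the fix) for spotting reply keys that will never expire. The key pattern `*.reply.celery.pidbox` is an assumption based on how kombu names reply queues, and the redis-py client usage is illustrative:

```python
import fnmatch

# Assumed naming pattern for Celery pidbox reply keys in Redis.
PIDBOX_REPLY_PATTERN = "*.reply.celery.pidbox"


def is_leaked_pidbox_key(key: str, ttl: int) -> bool:
    """True for a reply key whose TTL is -1, i.e. one Redis will never evict."""
    return fnmatch.fnmatch(key, PIDBOX_REPLY_PATTERN) and ttl == -1


def find_leaked_keys(client):
    """Scan a redis-py client for pidbox reply keys that have no expiry set."""
    return [
        key.decode()
        for key in client.scan_iter(match=PIDBOX_REPLY_PATTERN)
        if is_leaked_pidbox_key(key.decode(), client.ttl(key))
    ]
```

Against a live broker this could be run as, e.g., `find_leaked_keys(redis.Redis.from_url("redis://localhost:6379/0"))`; with the fix in place, every reply key should report a positive TTL instead.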

Impact on existing logic:

  • None - the change does not affect task routing, scheduling (beat), serialization, or any business logic
  • These parameters only apply to Celery's internal control communication mechanism (pidbox), used for commands like inspect, ping, shutdown
  • Tasks, queues (default, sdk_sync), and beat schedule remain unchanged
  • 300s is a safe value - control commands complete in seconds, so 5 minutes provides plenty of headroom

Checklist

General

  • My code follows the project's code style
  • I have performed a self-review of my code
  • I have added tests that prove my fix/feature works (if applicable)
  • New and existing tests pass locally

Backend Changes

Run the following from the backend directory:

  • uv run pre-commit run --all-files passes

Frontend Changes

NA

Testing Instructions

Screenshots

NA

Additional Notes

NA

@railway-app

railway-app bot commented Mar 5, 2026

🚅 Deployed to the open-wearables-pr-559 environment in Open Wearables

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| celery-worker | ✅ Success (View Logs) | | Mar 5, 2026 at 2:43 pm |
| frontend | ✅ Success (View Logs) | Web | Mar 5, 2026 at 2:42 pm |
| celery-beat | ✅ Success (View Logs) | | Mar 5, 2026 at 2:42 pm |
| backend | ✅ Success (View Logs) | Web | Mar 5, 2026 at 2:42 pm |
| celery-flower | ✅ Success (View Logs) | Web | Mar 5, 2026 at 2:42 pm |

@coderabbitai

coderabbitai bot commented Mar 5, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53eb2af4-c89c-46f7-8bd4-2ac8164cd238


@bartmichalak
Contributor Author

@coderabbitai could you confirm that these changes won't affect any logic related to background jobs, scheduled tasks, etc.?

@coderabbitai

coderabbitai bot commented Mar 5, 2026


🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Find core.py and related Celery config files
fd 'core.py' --exec cat {}

echo "---TASKS---"
fd 'tasks.py' --exec cat {}

echo "---CELERY---"
fd 'celery.py' --exec cat {}

echo "---BEAT---"
rg -n 'beat_schedule|CELERY_BEAT|celerybeat' --type py -C3

echo "---ROUTING---"
rg -n 'task_routes|CELERY_ROUTES|control_queue' --type py -C3
```

Length of output: 4575


@bartmichalak — Yes, I can confirm these changes are fully isolated from all background job and scheduling logic. Here's a breakdown based on the actual config in backend/app/integrations/celery/core.py:

What control_queue_ttl and control_queue_expires touch:
These two parameters exclusively govern Celery's internal pidbox control channel — the mechanism used for management commands like celery inspect, celery ping, and celery shutdown. They have no interaction with task execution or scheduling.

What remains completely unchanged:

| Concern | Config key | Status |
| --- | --- | --- |
| Task queues | task_queues (default, sdk_sync) | ✅ Untouched |
| Task routing | task_routes (process_sdk_upload → sdk_sync) | ✅ Untouched |
| Beat schedule | sync-all-users-periodic, finalize-stale-sleeps-periodic, gc-stuck-garmin-backfills | ✅ Untouched |
| Result storage | result_backend, result_expires (3 days) | ✅ Untouched |
| Serialization | task_serializer, accept_content | ✅ Untouched |
| Broker | broker_url | ✅ Untouched |

The 300s TTL is also a safe window — control commands complete in milliseconds to seconds, so there is no risk of evicting an in-flight reply. The change is a net positive: it prevents orphaned pidbox reply keys from accumulating in Redis indefinitely.



Development

Successfully merging this pull request may close these issues.

fix: set TTL on Celery pidbox reply keys to prevent Redis memory leak

2 participants